The formulation of the decision making of a failure detection process as a Bayes sequential decision problem (BSDP) provides a simple conceptualization of the decision rule design problem. As the optimal Bayes rule is not computable, a methodology that is based on the Bayesian approach and aimed at a reduced computational requirement is developed for designing suboptimal rules. A numerical algorithm is constructed to facilitate the design and performance evaluation of these suboptimal rules. The result of applying this design methodology to an example shows that this approach is a useful one.
1. INTRODUCTION

The failure detection and identification (FDI) process involves monitoring the sensor measurements, or processed measurements known as the residual [1], for changes from their normal (no-fail) behavior. Residual samples are observed in sequence. If a failure is judged to have occurred and sufficient information (from the residual) has been gathered, the monitoring process is stopped. Then, based on the past observations of the residual, an identification of the failure is made. If no failure has occurred, or if the information gathered is insufficient, monitoring is not interrupted so that further residual samples may be observed. The decision to interrupt residual monitoring and make a failure identification is based on a compromise between the speed and accuracy of the detection, and the failure identification reflects the design tradeoff among the errors in failure classification. Such a decision mechanism belongs to the extensively studied class of sequential tests or sequential decision rules. In this paper, we will employ the Bayesian approach [2] to design decision rules for FDI systems.

In Section 2, we will describe the Bayes formulation of the FDI decision problem. Although the optimal rule is generally not computable, the structure of the Bayesian approach can be used to derive practical suboptimal rules. We will discuss the design of suboptimal rules based on the Bayes formulation in Section 3. In Section 4, we will report our experience with this approach to designing decision rules through a numerical example and simulation.


2. THE BAYESIAN APPROACH

The BSDP formulation of the FDI problem consists of six elements:

1) $\Theta$: the set of states of nature or failure hypotheses. An element $\theta$ of $\Theta$ may denote a single type $i$ failure of size $\nu$ occurring at time $\tau$ ($\theta = (i,\tau,\nu)$) or the occurrence of a set of failures (possibly simultaneously), i.e. $\theta = \{(i_1,\tau_1,\nu_1),\ldots,(i_n,\tau_n,\nu_n)\}$. Due to the infrequent nature of failures, we will focus on the case of a single failure.

In many applications it suffices to identify the failure type without estimating the failure size. Moreover, it is often true that a detection system based on $(i,\tau,\bar\nu)$ for some appropriate $\bar\nu$ can also detect and identify the type of the failure for $\nu > \bar\nu$. Thus, we may use $(i,\tau,\bar\nu)$ to represent $(i,\tau)$. In the aircraft sensor FDI problem [3], for instance, excellent results were obtained using this approach. Now we have the discrete nature set

$$\Theta = \{(i,\tau):\ i = 1,\ldots,M,\ \tau = 1,2,\ldots\}$$

where we assume there are $M$ different failure types of interest.

2) $\mu$: the prior probability mass function (PMF) over the nature set $\Theta$. This PMF represents the a priori information concerning the failure, i.e. how likely it is for each type of failure to occur, and when a failure is likely to occur. Because this information may not be available or accurate in some cases, the need to specify $\mu$ is a drawback of the Bayes approach for such cases. Nevertheless, we will see that it can be regarded as a parameter in the design of a Bayes rule.

In general, $\mu$ may be arbitrary. Here, we assume the underlying failure process has two properties: i) the $M$ failures of $\Theta$ are independent of one another, and ii) the occurrence of each failure $i$ is a Bernoulli process with parameter $\rho_i$. The Bernoulli process (corresponding to the Poisson process in continuous time) is a common model for failures in physical components; it describes a large class of failures (such as sensor failures) while providing a simple approximation for the others. It is straightforward to show that


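$$\mu(i,\tau) = \sigma(i)\,\rho\,(1-\rho)^{\tau-1}, \qquad i = 1,\ldots,M,\quad \tau = 1,2,\ldots$$

$$\rho = 1 - \prod_{j=1}^{M}(1-\rho_j)$$

$$\sigma(i) = \rho_i(1-\rho_i)^{-1}\Big[\sum_{j=1}^{M}\rho_j(1-\rho_j)^{-1}\Big]^{-1}$$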

The parameter $\rho$ may be regarded as the parameter of the combined (Bernoulli) failure process, i.e. the occurrence of the first failure; $\sigma(i)$ can be interpreted as the marginal probability that the first failure is of type $i$. Note that the present choice of $\mu$ implies that the arrival of the first failure is memoryless. This property is useful in obtaining time-invariant suboptimal decision rules.
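As a numerical check of these expressions, the sketch below (Python with NumPy) evaluates $\rho$, $\sigma(i)$ and $\mu(i,\tau)$ from assumed per-type rates $\rho_j$ and confirms that $\sigma$ sums to one and that $\mu$ behaves as a proper PMF once the tabulated horizon in $\tau$ is long enough. The function name `failure_prior` and the rates used are hypothetical.

```python
import numpy as np

def failure_prior(rho_j, tau_max):
    """Return (rho, sigma, mu) for independent Bernoulli failure processes.

    rho_j   : per-step failure probabilities rho_1, ..., rho_M
    tau_max : number of failure times tau = 1, ..., tau_max to tabulate
    mu[i - 1, tau - 1] holds mu(i, tau) = sigma(i) * rho * (1 - rho)^(tau - 1).
    """
    rho_j = np.asarray(rho_j, dtype=float)
    rho = 1.0 - np.prod(1.0 - rho_j)                # combined first-failure parameter
    odds = rho_j / (1.0 - rho_j)                    # rho_i (1 - rho_i)^{-1}
    sigma = odds / odds.sum()                       # P(first failure is of type i)
    tau = np.arange(1, tau_max + 1)
    first_time = rho * (1.0 - rho) ** (tau - 1)     # geometric (memoryless) arrival time
    return rho, sigma, np.outer(sigma, first_time)

# Hypothetical rates for M = 2 failure types
rho, sigma, mu = failure_prior([1e-3, 5e-4], tau_max=50_000)
print(rho, sigma, mu.sum())
```

The geometric factor in $\tau$ is what makes the arrival time memoryless; the type probabilities $\sigma(i)$ do not depend on $\tau$.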

3) $\mathcal{D}(k)$: the discrete set of terminal actions (failure identifications) available to the decision maker when residual monitoring is stopped at time $k$. An element $\delta$ of $\mathcal{D}(k)$ may denote the pair $(j,t)$, i.e. declaration of a type $j$ failure to have occurred at time $t$. Alternatively, $\delta$ may represent an identification of the $j$-th failure type without regard for the failure time, or it may signify the presence of a failure without specification of its type or time, i.e. simply an alarm. Since the purpose of FDI is to detect and identify failures that have occurred, $\mathcal{D}(k)$ should only contain identifications that either specify failure times at or before $k$, or do not specify any failure time. As a result, the number of terminal decisions specifying failure times grows with $k$, while the number of decisions not specifying any time remains the same. In addition, $\mathcal{D}(k)$ does not include the declaration of no failure, since residual monitoring is stopped only when a failure appears to have occurred.

4) $L(k;\theta,\delta)$: the terminal decision cost function at time $k$. $L(k;\theta,\delta)$ denotes the penalty for deciding $\delta \in \mathcal{D}(k)$ at time $k$ when the true state of nature is $\theta = (i,\tau)$. It is assumed to be bounded and non-negative and to have the structure

$$L(k;(i,\tau),\delta) = \begin{cases} L((i,\tau),\delta), & \tau \le k,\ \delta \in \mathcal{D}(k) \\ L_F, & \tau > k,\ \delta \in \mathcal{D}(k) \end{cases}$$

where $L(\theta,\delta)$ is the underlying cost function that is independent of $k$; $L_F$ denotes the penalty for a false alarm, and it may be generalized to be dependent on $\delta$. It is only meaningful for a terminal action (identification) that indicates the correct failure (and/or time) to receive a lower decision cost than one that indicates the wrong failure (and/or time). We further assume that the penalty due to an incorrect identification of the failure time depends only on the error of such an identification. That is, for $\delta = (j,t)$,

$$L((i,\tau),(j,t)) = L(i,j,(t-\tau))$$

and for $\delta$ with no time specification,

$$L((i,\tau),\delta) = L(i,\delta).$$

5) $r(k)$: the $m$-dimensional residual (observation) vector at time $k$, with $p(r(1),\ldots,r(k)\mid(i,\tau))$ as the joint conditional density of $r(1),\ldots,r(k)$. Assuming that the residual is affected by the failure in a causal manner, its conditional density has the property

$$p(r(1),\ldots,r(k)\mid(i,\tau)) = p(r(1),\ldots,r(k)\mid(0,-)), \qquad i = 1,\ldots,M,\ \tau > k$$

where $(0,-)$ denotes the no-fail condition. For the design of suboptimal rules, we will assume that the residual is an independent Gaussian sequence with $V$ (an $m\times m$ matrix) as the time-independent covariance function and $g_i(k-\tau)$ as the mean given that the failure $(i,\tau)$ has occurred. With the covariance assumed to be the same for all failures, the mean function $g_i(k-\tau)$ characterizes the effect of the failure $(i,\tau)$, and it is henceforth called the signature of $(i,\tau)$ (with $g_i(k-\tau) = 0$ for $i = 0$ or $\tau > k$). We have chosen to study this type of residual because its special structure facilitates the development of insights into the design of decision rules. Moreover, the Gaussian assumption is reasonable in many problems and has met with success in a wide variety of applications, e.g., [3], [4]. (It should be noted that the use of more general probability densities for the residual will not add any conceptual difficulty.)

6) $c(k,(i,\tau))$: the delay cost function, having the properties

$$c(k,(i,\tau)) = \begin{cases} c(i,k-\tau) > 0, & \tau < k \\ 0, & \tau \ge k \end{cases}$$

$$c(i,k_1-\tau) > c(i,k_2-\tau), \qquad k_1 > k_2 > \tau.$$

After a failure has occurred at $\tau$, there is a penalty for delaying the terminal decision until time $k > \tau$, with the penalty an increasing function of the delay $(k-\tau)$. In the absence of a failure, no penalty is imposed on the sampling. In this study we will consider a delay cost function that is linear in the delay, i.e. $c(i,k-\tau) = c(i)(k-\tau)$, where $c(i)$ is a positive function of the failure type $i$ and may be used to provide different delay penalties for different types of failures.

A sequential decision rule naturally consists of two parts: a stopping rule (or sampling plan) and a terminal decision rule. The stopping rule, denoted by $\Phi = (\phi(0),\phi(1;r(1)),\ldots,\phi(k;r(1),\ldots,r(k)),\ldots)$, is a sequence of functions of the observed residual samples, with $\phi(k;r(1),\ldots,r(k)) = 1$ or $0$. When $\phi(k;r(1),\ldots,r(k)) = 1$ ($0$), residual monitoring, or sampling, is stopped (continued) after the $k$ residual samples $r(1),\ldots,r(k)$ are observed. Alternatively, the stopping rule may be defined by another sequence of functions $\Psi = (\psi(0),\psi(1;r(1)),\ldots,\psi(k;r(1),\ldots,r(k)),\ldots)$, where $\psi(k;r(1),\ldots,r(k)) = 1$ ($0$) indicates that residual monitoring has been carried on up to and including time $(k-1)$ and will (not) be stopped after time $k$, when residual samples $r(1),\ldots,r(k)$ are observed. The functions $\phi$ and $\psi$ are related to each other in the following way:

$$\psi(k;r(1),\ldots,r(k)) = \phi(k;r(1),\ldots,r(k))\prod_{s=0}^{k-1}\big[1-\phi(s;r(1),\ldots,r(s))\big]$$

with $\psi(0) = \phi(0)$.
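The product form of this relation simply picks out the first time at which $\phi$ equals one. A minimal Python sketch, assuming the values $\phi(0),\phi(1;\cdot),\ldots$ along one observed residual path are already available as a list of 0/1 integers (the function name `stop_indicators` is hypothetical):

```python
def stop_indicators(phi):
    """Map phi(0), phi(1), ... (0/1 values along one residual path) to psi(0), psi(1), ...

    psi(k) = phi(k) * prod_{s=0}^{k-1} (1 - phi(s)), so psi(0) = phi(0) and psi is 1
    only at the first time phi equals 1.
    """
    psi = []
    not_yet_stopped = 1                   # running product of (1 - phi(s)), s < k
    for value in phi:
        psi.append(value * not_yet_stopped)
        not_yet_stopped *= 1 - value
    return psi

print(stop_indicators([0, 0, 1, 0, 1]))  # [0, 0, 1, 0, 0]
```

Any later value $\phi(k) = 1$ is ignored because sampling has already stopped, which is why at most one $\psi(k)$ along a path can equal one.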

The terminal decision rule is a sequence of functions, $D = (d(0),d(1;r(1)),\ldots,d(k;r(1),\ldots,r(k)),\ldots)$, mapping the samples $r(1),\ldots,r(k)$ into the terminal action set $\mathcal{D}(k)$. The function $d(k;r(1),\ldots,r(k))$ represents the decision rule used to arrive at an action (identification) if sampling is stopped at time $k$ and the residual samples $r(1),\ldots,r(k)$ are observed.

As a result of using the sequential decision rule $(\Phi,D)$, given that $(i,\tau)$ is the true state of nature, the total expected cost is

$$U_B[(i,\tau),(\Phi,D)] = \sum_{k=0}^{\infty} E_{i,\tau}\Big\{\psi(k;r(1),\ldots,r(k))\big[c(k,(i,\tau)) + L(k;(i,\tau),d(k;r(1),\ldots,r(k)))\big]\Big\}.$$

The BSDP is defined as follows: determine a sequential decision rule $(\Phi^*,D^*)$ so that the sequential Bayes risk $U_S$ is minimized, where

$$U_S(\Phi,D) = E\{U_B[(i,\tau),(\Phi,D)]\} = \sum_{i=1}^{M}\sum_{\tau=1}^{\infty}\mu(i,\tau)\,U_B[(i,\tau),(\Phi,D)].$$

The rule $(\Phi^*,D^*)$ is called the Bayes Sequential Decision Rule (BSDR) with respect to $\mu$, and it is optimal in the sense that it minimizes the sequential Bayes risk.

In the following we will discuss an interpretation of the sequential risk for the FDI problem. Let us define the following notation:

$$P_F(\tau) = \sum_{k=0}^{\tau-1} E_0\{\psi(k;r(1),\ldots,r(k))\}$$

$$S(k,\delta) = \{[r(1),\ldots,r(k)]:\ \psi(k;r(1),\ldots,r(k)) = 1,\ d(k;r(1),\ldots,r(k)) = \delta\}, \qquad \delta \in \mathcal{D}$$

$$P_k\{S(k,\delta)\mid i,\tau\} = \int_{S(k,\delta)} p(r(1),\ldots,r(k)\mid i,\tau)\,dr(1)\cdots dr(k)$$

$$\bar t(i,\tau) = \sum_{k=\tau}^{\infty}(k-\tau)\,(1-P_F(\tau))^{-1}\,E_{i,\tau}\{\psi(k;r(1),\ldots,r(k))\}$$

$$P((i,\tau),\delta) = \sum_{k=\tau}^{\infty} P_k\{S(k,\delta)\mid i,\tau\}\,(1-P_F(\tau))^{-1}$$

where $P_F(\tau)$ is the probability of stopping to declare a failure before the failure occurs at $\tau$, i.e. the probability of false alarm when a failure occurs at time $\tau$; $\mathcal{D}$ is the set of terminal actions for all times; and $S(k,\delta)$ is the region in the sample space of the first $k$ residuals where the sequential rule $(\Phi,D)$ yields the terminal decision $\delta$. Clearly, the $S(k,\delta)$'s are disjoint sets with respect to both $k$ and $\delta$. The expressions $\bar t(i,\tau)$ and $P((i,\tau),\delta)$ are, respectively, the conditional expected delay in decision (i.e. stopping sampling and making a failure identification) and the conditional probability of eventually declaring $\delta$, given that a type $i$ failure has occurred at time $\tau$ and no false alarm has been signalled before this time. $P((i,\tau),\delta)$ is the generalized cross-detection probability. Finally, the sequential Bayes risk $U_S$ can be written as

$$U_S = \sum_{i=1}^{M}\sum_{\tau=1}^{\infty}\mu(i,\tau)\Big\{L_F\,P_F(\tau) + (1-P_F(\tau))\Big[c(i)\,\bar t(i,\tau) + \sum_{\delta\in\mathcal{D}} L((i,\tau),\delta)\,P((i,\tau),\delta)\Big]\Big\} \qquad (1)$$

Equation (1) indicates that the sequential Bayes risk is a weighted combination of the conditional false alarm probability, the expected delay to decision, and the cross-detection probabilities, and the optimal sequential rule minimizes such a combination. From this vantage point, the cost functions ($L$ and $c$) and the prior distribution $\mu$ provide the weighting, thus effectively specifying the tradeoff relationships among the various performance issues.

The advantage of this indirect approach is that only the total expected cost, instead of every individual performance issue, needs to be considered explicitly in designing a sequential rule. The drawback of the approach, however, lies in the choice of a set of appropriate cost functions (and sometimes the prior distribution) when the physical problem does not have a natural set, as is generally the case. In this case, the Bayes approach is most useful with the cost functions (and the prior distribution) treated as design parameters that may be adjusted to obtain an acceptable design.

The optimal terminal decision rule $D^*$ can easily be shown to be a sequence of fixed-sample-size tests [2]. The determination of the optimal stopping rule $\Phi^*$ is a dynamic programming problem [1]. The immense storage and computation required make $\Phi^*$ impossible to compute, and suboptimal rules must be used.

Despite the impractical nature of its solution, the BSDP provides a useful framework for designing suboptimal decision rules for the FDI problem because of its inherent characteristic of explicitly weighing the tradeoffs between detection speed and accuracy (in terms of the cost structure). A sequential decision rule defines a set of sequential decision regions $S(k,\delta)$, and the decision regions corresponding to the BSDR yield the minimum risk. From this vantage point, the design of a suboptimal rule can be viewed as the problem of choosing a set of decision regions that yields a reasonably small risk. This is the essence of the approach to suboptimal rule design that we will describe next.


3. SUBOPTIMAL RULES

The Sliding Window Approximation

The immense computation associated with the BSDR is partly due to the increasing number of failure hypotheses as time progresses. The remedy for this problem is the use of a sliding window to limit the number of failure hypotheses to be considered at each time. The assumption made under the sliding window approximation is that essentially all failures can be detected within $W$ time steps after they have occurred, or that if a failure is not detected within this time it will not be detected in the future. Here, the window size $W$ is a design parameter, and it should be chosen long enough that detection and identification of failures are possible, but short enough that implementation is feasible [1].

The sliding window rule $(\Phi^W,D^W)$ divides the sample space of the sliding window of residuals $\{r(k-W+1),\ldots,r(k)\}$, or equivalently the space of vectors of posterior probabilities, likelihood ratios, or log likelihood ratios of the sliding window of failure hypotheses, into disjoint time-independent sequential decision regions $\{S_0,S_1,\ldots,S_N\}$. Because the residuals are assumed to be Gaussian variables, it is simpler to work with $L$ (which differs from the log likelihood ratio by a constant):

$$L(k) = [L_0'(k),\ldots,L_{W-1}'(k)]'$$

where

$$L_t(k) = [L(k;1,t),\ldots,L(k;M,t)]'$$

$$L(k;i,t) = \sum_{s=0}^{t} g_i'(s)\,V^{-1}\,r(k-t+s) \qquad (2)$$

Then, the sliding window rule states: at each time $k \ge W$, we compute the decision statistics $L(k)$ from the window of residuals. If $L(k) \in S_j$, $j = 1,\ldots,N$, we will stop sampling to declare the $j$-th terminal action; if $L(k) \in S_0$, we will proceed to take one more observation of the residual. The Bayes design problem is
to determine a set of regions $\{S_0,S_1,\ldots,S_N\}$ that minimizes the sequential risk $U_S^W(\{S_j\})$. This represents a functional minimization problem for which a solution is generally very difficult to determine. A simpler alternative is to constrain the decision regions to take on special shapes $\{S_j(f)\}$ that are parameterized by a fixed-dimensional vector $f$ of design variables. Then the resulting design problem involves the determination of a set of parameter values $f^*$ that minimizes the risk $U_S^W(f)$. We will focus our attention on a special set of parametrized sequential decision regions, because they are simple and they serve well to illustrate that the Bayes formulation can be exploited, in a systematic fashion, to obtain simple suboptimal rules that are capable of delivering good performance. These decision regions are

$$S(j,t) = \{L(k):\ L(k;j,t) > f(j,t),\ \ \sigma^{-1}(j,t)[L(k;j,t)-f(j,t)] > \sigma^{-1}(i,\tau)[L(k;i,\tau)-f(i,\tau)],\ (i,\tau) \ne (j,t)\} \qquad (3a)$$

$$S(0,-) = \{L(k):\ L(k;i,\tau) < f(i,\tau),\ i = 1,\ldots,M,\ \tau = 0,\ldots,W-1\} \qquad (3b)$$

where $S(j,t)$ is the stop-to-declare-$(j,k-t)$ region and $S(0,-)$ is the continue region (see Fig. 1). Generally the $\sigma$'s may be regarded as design parameters, but here $\sigma(j,t)$ is simply taken to be the standard deviation of $L(k;j,t)$.
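To make the statistics (2) and the regions (3a)-(3b) concrete, the following Python/NumPy sketch computes $L(k;i,t)$ for all $i$ and $t$ from stored residuals and then reports which region the resulting vector falls in. The array layouts, the names `window_statistics` and `classify`, and the signature values in the example are assumptions made for illustration; points lying exactly on a threshold (a set of probability zero) are assigned to a stopping region here.

```python
import numpy as np

def window_statistics(r, k, sig, V, W):
    """L[t, i] = sum_{s=0}^{t} g_i(s)' V^{-1} r(k - t + s), Eq. (2), for t = 0..W-1.

    r   : array (K, m); row n holds r(n) (row 0 may be an unused placeholder)
    sig : array (M, W, m); sig[i, s] holds the signature g_{i+1}(s)
    Requires k >= W - 1 so that every index k - t + s is a valid row.
    """
    Vinv = np.linalg.inv(V)
    M = sig.shape[0]
    L = np.zeros((W, M))
    for t in range(W):                 # hypothesis: failure t steps before time k
        for s in range(t + 1):         # s = time elapsed since that failure
            L[t] += sig[:, s, :] @ Vinv @ r[k - t + s]
    return L

def classify(L, f, sd):
    """Apply (3a)-(3b); L, f, sd are arrays indexed [t, i].

    Returns None for the continue region S(0,-); otherwise (j, t) with j 1-based,
    meaning "declare a type-j failure that occurred at time k - t".
    """
    if np.all(L < f):
        return None                                   # (3b)
    margin = (L - f) / sd                             # normalized exceedances
    t, j = np.unravel_index(np.argmax(margin), margin.shape)
    return int(j) + 1, int(t)                         # (3a): largest normalized exceedance

# Hypothetical example: M = 2, m = 2, W = 3, V = I, noise-free residual after a
# type-1 failure (constant bias [1, 0]) at time 2.
sig = np.zeros((2, 3, 2))
sig[0, :, 0] = 1.0                     # g_1(s) = [1, 0]
sig[1, :, 1] = np.arange(1, 4)         # g_2(s) = [0, s + 1] (ramp)
r = np.zeros((5, 2))
r[2:, 0] = 1.0                         # r(2) = r(3) = r(4) = [1, 0]
L = window_statistics(r, 4, sig, np.eye(2), 3)
print(L[:, 0])                                                 # [1. 2. 3.]
print(classify(L, np.full((3, 2), 2.5), np.ones((3, 2))))      # (1, 2)
```

In the example the rule correctly declares a type-1 failure at time $k - t = 2$; the sliding window is what gives the rule this failure-time resolution.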

To evaluate $U_S^W(f)$, we need to determine the set of probabilities

$$\{\Pr\{L(k)\in S(j,t),\,L(k-1)\in S(0,-),\ldots,L(W)\in S(0,-)\mid i,\tau\}:\ k \ge W,\ j = 0,1,\ldots,M,\ t = 0,\ldots,W-1\}$$

which, indeed, is the goal of many research efforts on the so-called level-crossing problem [5]. Unfortunately, useful results (bounds and approximations of such probabilities) are only available for the scalar case [6], [7], [8]. As it stands, each of the probabilities is an integral of a $kMW$-dimensional Gaussian density over the compound region $S(0,-)\times\cdots\times S(0,-)\times S(j,t)$, which, for large $kMW$, becomes extremely unwieldy and difficult to evaluate.

The $MW$-dimensional vector of decision statistics $L(k)$ corresponds to the $MW$ failure hypotheses, and its components provide the information necessary for the simultaneous identification of both failure type and failure time. In most applications, such as the aircraft sensor FDI problem [3] and the detection of freeway traffic incidents [4], where the failure time need not be explicitly identified, the failure-time resolution power provided by the full window of decision statistics is not needed. Instead, decision rules that employ a few components of $L(k)$ may be used. The decision rule of this type considered here consists of sequential decision regions that are similar to (3) but are defined only in terms of $M$ components of $L(k)$:

$$S_j = \{L_{W-1}(k):\ L(k;j,W-1) > f_j,\ \ \sigma_j^{-1}[L(k;j,W-1)-f_j] > \sigma_i^{-1}[L(k;i,W-1)-f_i],\ i \ne j\}, \qquad j = 1,\ldots,M \qquad (4a)$$

$$S_0 = \{L_{W-1}(k):\ L(k;j,W-1) < f_j,\ j = 1,\ldots,M\} \qquad (4b)$$

where $S_j$ is the stop-to-declare-failure-$j$ region and $S_0$ is the continue region. It should be noted that the use of (4) is effective if the cross-correlations of signatures among hypotheses of the same failure type at different times are smaller than those among hypotheses of different failure types.

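The risk for using (4) is

$$U_S^W(f) = \sum_{i=1}^{M}\sum_{\tau=1}^{\infty}\mu(i,\tau)\,L_F\sum_{k=W}^{\tau-1}\sum_{j=1}^{M}\Pr\{L_{W-1}(k)\in S_j,\,S_0(k-1)\mid 0,-\}$$
$$\qquad + \sum_{i=1}^{M}\sum_{\tau=1}^{\infty}\mu(i,\tau)\sum_{k=\max[W,\tau]}^{\infty}\sum_{j=1}^{M}\big[c(i)(k-\tau)+L(i,j)\big]\Pr\{L_{W-1}(k)\in S_j,\,S_0(k-1)\mid i,\tau\}$$

where $S_0(k)$ denotes the event $\{L_{W-1}(W)\in S_0,\ldots,L_{W-1}(k)\in S_0\}$ of continued sampling up to time $k$.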

The probabilities required for calculating the risk are given by the recursion

$$p(L_{W-1}(k+1)\mid S_0(k),i,\tau) = \Big[\int_{S_0} p(L_{W-1}(k)\mid S_0(k-1),i,\tau)\,dL_{W-1}(k)\Big]^{-1}\int_{S_0} p(L_{W-1}(k+1)\mid L_{W-1}(k),S_0(k-1),i,\tau)\,p(L_{W-1}(k)\mid S_0(k-1),i,\tau)\,dL_{W-1}(k), \qquad k \ge W \qquad (5)$$

$$\Pr\{L_{W-1}(k)\in S_j,\,S_0(k-1)\mid i,\tau\} = \Pr\{S_0(k-1)\mid i,\tau\}\int_{S_j} p(L_{W-1}(k)\mid S_0(k-1),i,\tau)\,dL_{W-1}(k), \qquad j = 0,1,\ldots,M \qquad (6)$$

$$\Pr\{L_{W-1}(W)\in S_j\mid i,\tau\} = \int_{S_j} p(L_{W-1}(W)\mid i,\tau)\,dL_{W-1}(W) \qquad (7)$$

For $M$ small, numerical integration of (5)-(7) becomes manageable.

Unfortunately, the transition density $p(L_{W-1}(k+1)\mid L_{W-1}(k),S_0(k-1),i,\tau)$ required in (5) is difficult to calculate, because $L_{W-1}(k)$ is not a Markov process. In order to facilitate computation of the probabilities, we need to approximate the transition density. In approximating the required transition density for $L_{W-1}(k)$ we are, in fact, approximating the behavior of $L_{W-1}(k)$. A simple approximation is a Gauss-Markov process $\ell(k)$ defined by

$$\ell(k+1) = A\,\ell(k) + \xi(k+1)$$

$$\operatorname{cov}\{\xi(k),\xi(t)\} = BB'\,u_0(k-t)$$

where $A$ and $B$ are $M\times M$ constant matrices, $u_0$ is the unit impulse, and $\xi$ is a white Gaussian sequence with covariance equal to the $(M\times M)$ matrix $BB'$. The reason for choosing this model is twofold. Firstly, like $L_{W-1}(k)$, $\ell(k)$ is Gaussian. Secondly, $\ell(k)$ is Markov, so its transition density can be readily determined. In order to have $\ell(k)$ behave like $L_{W-1}(k)$, we set the matrices $A$ and $B$ and the mean of $\xi$ such that

$$E_{i,\tau}\{\ell(k)\} = E_{i,\tau}\{L_{W-1}(k)\} \qquad (8)$$

$$E_{0,-}\{\ell(k)\,\ell'(k)\} = E_{0,-}\{L_{W-1}(k)\,L_{W-1}'(k)\} \qquad (9)$$

$$E_{0,-}\{\ell(k)\,\ell'(k+1)\} = E_{0,-}\{L_{W-1}(k)\,L_{W-1}'(k+1)\} \qquad (10)$$
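That is, we have matched the marginal density and the one-step cross-covariance of $\ell(k)$ to those of $L_{W-1}(k)$. It can be shown that (8)-(10) uniquely specify

$$A = C_1 C_0^{-1}$$

$$BB' = C_0 - C_1 C_0^{-1} C_1'$$

$$E_{i,\tau}\{\xi(k+1)\} = E_{i,\tau}\{L_{W-1}(k+1)\} - A\,E_{i,\tau}\{L_{W-1}(k)\}$$

where, with $G_t = [g_1(t),\ldots,g_M(t)]$ denoting the $m\times M$ matrix of signatures at elapsed time $t$,

$$C_0 = E_{0,-}\{L_{W-1}(k)L_{W-1}'(k)\} = \sum_{t=0}^{W-1} G_t' V^{-1} G_t, \qquad C_1 = E_{0,-}\{L_{W-1}(k+1)L_{W-1}'(k)\} = \sum_{t=0}^{W-2} G_t' V^{-1} G_{t+1}$$

$$E_{i,\tau}\{L_{W-1}(k)\} = \sum_{t=0}^{W-1} G_t' V^{-1} g_i(k-W+1+t-\tau)$$

(with $g_i(\cdot) = 0$ for negative arguments). Moreover, the matrix $A$ is stable, i.e. the magnitudes of all of the eigenvalues of $A$ are less than unity, and $B$ is invertible if $G_0$ or $G_{W-1}$ is of rank $M$. Because $\ell$ is an artificial process (i.e. $\ell$ is not a direct function of the residuals $r(k)$), $\ell(k)$ can never be implemented for use in (4).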

We may choose other Markov approximations of $L_{W-1}(k)$ that match the $n$-step cross-covariance ($1 < n < W$) instead of matching the one-step cross-covariance as in (10). The suitability of a criterion for choosing the matrices $A$ and $B$, such as (9) and (10), depends directly on the failure signatures under consideration and may be examined as an issue separate from the decision rule design problem. Also, a higher-order Markov process may be used to approximate $L_{W-1}$. However, the increase in computational complexity may negate the benefits of the approximation.
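The moment-matching formulas for $A$ and $BB'$ can be evaluated directly from the signatures. The NumPy sketch below follows the expressions for $C_0$, $C_1$, $A$ and $BB'$ given above; the array layout `G[t]` $= G_t$ and the function name are assumptions made for illustration. In the usage example (hypothetical constant-bias signatures $g_1 = [1,0]'$, $g_2 = [0,1]'$, $V = I$, $W = 5$) one gets $C_0 = 5I$ and $C_1 = 4I$, hence $A = 0.8I$ and $BB' = 1.8I$.

```python
import numpy as np

def markov_approximation(G, V):
    """Moment-matched Gauss-Markov model for L_{W-1}(k).

    G : array (W, m, M) with G[t] = [g_1(t), ..., g_M(t)]
    V : (m, m) residual covariance
    Returns A, BB', C0, C1 with A = C1 C0^{-1} and BB' = C0 - C1 C0^{-1} C1'.
    """
    Vinv = np.linalg.inv(V)
    W, _, M = G.shape
    C0 = sum(G[t].T @ Vinv @ G[t] for t in range(W))               # lag-0 covariance
    C1 = sum((G[t].T @ Vinv @ G[t + 1] for t in range(W - 1)),     # lag-1 covariance
             np.zeros((M, M)))
    A = C1 @ np.linalg.inv(C0)
    BBt = C0 - A @ C1.T
    return A, BBt, C0, C1

G = np.tile(np.eye(2), (5, 1, 1))        # W = 5, constant-bias signatures
A, BBt, C0, C1 = markov_approximation(G, np.eye(2))
print(np.diag(A), np.diag(BBt))          # [0.8 0.8] [1.8 1.8]
print(np.max(np.abs(np.linalg.eigvals(A))) < 1)   # True: A is stable
```

A useful consistency check is that the stationary covariance of $\ell$, the solution $P$ of $P = APA' + BB'$, reproduces $C_0$, which is exactly condition (9).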

Now we can approximate the required probabilities in the risk calculation as

$$\Pr\{L_{W-1}(k)\in S_j,\,S_0(k-1)\mid i,\tau\} \approx \Pr\{\ell(k)\in S_j,\,S_0(k-1)\mid i,\tau\}, \qquad j = 0,1,\ldots,M,\ k \ge W$$

$$\Pr\{\ell(k)\in S_j,\,S_0(k-1)\mid i,\tau\} = \Pr\{S_0(k-1)\mid i,\tau\}\int_{S_j} p(\ell(k)\mid S_0(k-1),i,\tau)\,d\ell(k) \qquad (11)$$

where we have applied the same decision rule to $\ell(k)$ as to $L_{W-1}(k)$. Therefore, $S_j$ and $S_0(k-1)$ denote the decision regions and the event of continued sampling up to time $k$ for both $L_{W-1}$ and $\ell$. Assuming $B^{-1}$ exists, we have

$$p(\ell(k+1)\mid S_0(k),i,\tau) = \Big[\int_{S_0} p(\ell(k)\mid S_0(k-1),i,\tau)\,d\ell(k)\Big]^{-1}\int_{S_0} p_\xi\big(\xi(k+1) = \ell(k+1)-A\ell(k)\mid i,\tau\big)\,p(\ell(k)\mid S_0(k-1),i,\tau)\,d\ell(k), \qquad k \ge W \qquad (12)$$

where $p_\xi(\xi(k)\mid i,\tau)$ is the Gaussian density of $\xi(k)$ under the failure $(i,\tau)$. Now the integrals (11) and (12) represent more tractable numerical problems.

In the event that $B$ is not invertible, the transition density is degenerate and (12) is very difficult to evaluate. Very often this problem can be circumvented by batch processing the residuals. That is, we may consider the modified residual sequence $\tilde r(\kappa) = [r'(v\kappa-v+1), r'(v\kappa-v+2),\ldots,r'(v\kappa)]'$ for some batch size $v > 0$, with $\kappa = 1,2,\ldots$ as the new time index. In using $\tilde r(\kappa)$ we have to augment the signatures as $[g_i'(0),\ldots,g_i'(v-1)]'$, $i = 1,\ldots,M$. By a proper choice of $v$, the rank of $G_0$ can be increased to $M$ and $B$ will be invertible.

Non-Window Sequential Decision Rules

Here we will describe another simple decision rule that has the same decision regions as the simplified sliding window rule (4), but the vector $z$ of $M$ decision statistics is obtained differently, as follows:

$$z(k+1) = A\,z(k) + B\,r(k+1) \qquad (13)$$

where $A$ is a constant stable $M\times M$ matrix, and $B$ is an $M\times m$ constant matrix of rank $M$. Unlike the Markov model $\ell(k)$ that approximates $L_{W-1}(k)$, $z(k)$ is a realizable Markov process driven by the residual. The advantages of using $z$ as the decision statistic are: 1) less storage is required, because residual samples need not be stored as they must be in the sliding window scheme, and 2) since $z$ is Markov, the required probability integrals are of the form (11) and (12), so the same integration algorithm can be applied directly to evaluate them. (It is possible to use a higher-order $z$, but the added complexity would negate the advantages.)

In order to form the statistics $z$, we need to choose the matrices $A$ and $B$. When the failure signatures under consideration are constant biases, $B$ can simply be set equal to $G_0'$, and $A$ can be chosen to be $aI$, where $0 < a < 1$. Then the term $Br$ in (13) resembles $g'V^{-1}r$ of (2), and it provides the correlation of the residual with the signatures. The time constant $1/(1-a)$ of $z$ characterizes the memory span of $z$, just as $W$ characterizes that of the sliding window rules.

More generally, if we consider failure signatures that are not constant biases, the choice of $A$ may still be handled in the same way as in the constant-bias case, but the selection of a $B$ matrix is more involved. With some insight into the nature of the signatures, a reasonable choice of $B$ can often be made. To illustrate how this may be accomplished, we will consider an example with two failure modes and an $m$-dimensional residual vector. Let


$$g_1(k-\tau) = \beta_1$$

$$g_2(k-\tau) = \beta_2\,(k-\tau+1)$$

That is, $g_1$ is a constant bias, and $g_2$ is a ramp. If $\beta_1$ and $\beta_2$ are not multiples of each other, a simple choice of $B$ is available:

$$B = [\beta_1\ \ \beta_2]'$$

If $\beta_1 = \alpha_1\beta$ and $\beta_2 = \alpha_2\beta$, where $\alpha_1$ and $\alpha_2$ are scalar constants, the above choice of $B$ has rank one and is not useful for identifying either signature. Suppose we batch process every two residual samples together, i.e. we use the residual sequence $\tilde r(k) = [r'(2k-1), r'(2k)]'$, $k = 1,2,\ldots$. Then we can set $B$ to be a $2\times 2m$ matrix whose first and second rows capture the constant-bias and ramp nature, respectively, of the signatures (and this $B$ has rank two). The use of the modified residual $\tilde r(k)$ in this case causes no adverse effect, since it only slightly lengthens the interval between times when terminal decisions may be made. A big increase in such intervals, i.e. the batch processing of $r(k),\ldots,r(k+v)$ simultaneously for large $v$, may, however, be undesirable. For problems where the signatures vary drastically as a function of the elapsed time, or where the distinguishability among failures depends essentially on these variations, the effectiveness of using $z$ diminishes. In such cases the sliding window decision rule should provide better performance because of its inherent nature to look for a full window's worth of signature.
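A minimal sketch of the non-window rule: the statistic (13) with the constant-bias choice $A = aI$ is propagated one residual at a time, and the regions (4) are applied to $z(k)$ in place of $L_{W-1}(k)$. The function names, the scalar forgetting factor `a`, the example signatures, and taking $\sigma_j$ as the no-fail steady-state standard deviation of $z_j$ (the diagonal of $BVB'/(1-a^2)$, which follows from (13) with $A = aI$) are illustrative assumptions.

```python
import numpy as np

def steady_state_sd(B, V, a):
    """No-fail steady-state standard deviations of z for z(k+1) = a z(k) + B r(k+1)."""
    return np.sqrt(np.diag(B @ V @ B.T) / (1.0 - a ** 2))

def run_markov_rule(residuals, B, a, f, sd):
    """Propagate (13) with A = a*I and apply regions (4) to z(k).

    Returns (k, j): stop at time k declaring failure type j (1-based),
    or (None, None) if every z(k) stays in the continue region S_0.
    """
    z = np.zeros(B.shape[0])
    for k, r in enumerate(residuals, start=1):
        z = a * z + B @ r                       # Eq. (13)
        if np.any(z > f):                       # left S_0, cf. (4b)
            j = int(np.argmax((z - f) / sd))    # largest normalized exceedance, cf. (4a)
            return k, j + 1
    return None, None

# Hypothetical usage: constant biases g_1 = [1, 0]', g_2 = [0, 1]', so B = G_0' = I.
B, a = np.eye(2), 0.5
f = np.array([1.5, 1.5])
sd = steady_state_sd(B, np.eye(2), a)
r = np.tile([1.0, 0.0], (10, 1))                # noise-free type-1 residual from k = 1
print(run_markov_rule(r, B, a, f, sd))          # (3, 1)
```

Only the $M$-vector $z$ is stored between steps, which is the storage advantage over the sliding window noted above.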

Probability Calculation

An algorithm based on 1-dimensional Gaussian quadrature formulas [9] has been developed to compute the probability integrals of (11) and (12) for the case $M = 2$. (It can be extended to higher dimensions with an increase in computation.) The details of this quadrature algorithm are described in [1]. Its accuracy has been assessed via comparison with Monte Carlo simulations (see the numerical example). With this algorithm we can evaluate the performance probabilities and risks associated with the suboptimal decision rules described above.
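The quadrature algorithm of [1] is not reproduced here. As a simplified stand-in, the Python sketch below propagates (11)-(12) for a scalar ($M = 1$) statistic $\ell(k+1) = a\,\ell(k) + \xi(k+1)$ with a one-sided continue region $\ell \le f$, using trapezoidal weights on a uniform grid rather than Gaussian quadrature. It carries the unnormalized density $\Pr\{S_0(k-1)\}\,p(\ell(k)\mid S_0(k-1))$, which removes the explicit normalizing factor in (12) without changing the resulting probabilities. All names, the grid limits, and the input means are assumptions.

```python
import numpy as np
from math import erf, sqrt, pi

_erf = np.vectorize(erf)

def upper_tail(x):
    """1 - Phi(x) for the standard normal, applied elementwise."""
    return 0.5 * (1.0 - _erf(np.asarray(x, dtype=float) / sqrt(2.0)))

def normal_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / sqrt(2.0 * pi * var)

def first_exit_probabilities(a, q, xi_means, f, m0, v0, lo=-12.0, n=1201):
    """Scalar analogue of (11)-(12) with continue region S_0 = {ell <= f}.

    ell(0) ~ N(m0, v0) and ell(k+1) = a*ell(k) + xi(k+1), xi(k+1) ~ N(xi_means[k], q).
    Returns p with p[0] = Pr{ell(0) > f} and p[k] = Pr{ell(k) > f, ell(0..k-1) <= f}.
    """
    x = np.linspace(lo, f, n)                      # grid over S_0 (lower tail cut at lo)
    w = np.full(n, x[1] - x[0])
    w[[0, -1]] *= 0.5                              # trapezoidal weights
    dens = normal_pdf(x, m0, v0)                   # Pr{S_0(k-1)} * p(ell(k) | S_0(k-1)) on S_0
    p = [float(upper_tail((f - m0) / sqrt(v0)))]
    for m in xi_means:
        mass = w * dens                            # probability carried by each grid node
        cond_mean = a * x + m                      # E{ell(k+1) | ell(k) = x}
        p.append(float(mass @ upper_tail((f - cond_mean) / sqrt(q))))   # exit now, cf. (11)
        kernel = normal_pdf(x[:, None], cond_mean[None, :], q)          # transition, cf. (12)
        dens = kernel @ mass                       # unnormalized density of ell(k+1) on S_0
    return np.array(p)

# Check: with a = 0 the steps are independent, so p[k] = (1 - Phi(1)) * Phi(1)**k.
p = first_exit_probabilities(a=0.0, q=1.0, xi_means=[0.0] * 5, f=1.0, m0=0.0, v0=1.0)
phi1 = 1.0 - float(upper_tail(1.0))
print(np.max(np.abs(p - (1.0 - phi1) * phi1 ** np.arange(6))))   # small (quadrature error)
```

The memoryless case $a = 0$ gives a geometric law that is easy to verify; for $a \ne 0$ the same loop handles the correlation through the transition kernel.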

Risk Calculation

In the absence of a failure, the conditional density $p(\ell(k)\mid S_0(k-1),0,-)$ has been observed to essentially reach a steady state at some finite time $T > W$.¹ Then, for $k \ge T$ we have

$$\Pr\{\ell(k)\in S_j\mid S_0(k-1),0,-\} \approx b_j \qquad (14)$$

$$\Pr\{\ell(k)\in S_j,\,\ell(k-1)\in S_0,\ldots,\ell(\tau)\in S_0\mid S_0(\tau-1),i,\tau\} \approx b_j(k-\tau\mid i), \qquad k \ge \tau \ge T \qquad (15)$$

That is, once steady state is reached, only the relative (elapsed) time is important. Generally, failures occur infrequently, and decision rules with low false alarm probabilities are employed. Thus, it is reasonable to assume 1) $\rho \ll 1$ ($(1-\rho)^T \approx 1$), and 2) $\Pr\{S_0(T)\mid 0,-\} \approx 1$. The sequential risk associated with (4) for $M = 2$ can then be approximated by

$$U_S'(f) = P_F L_F + (1-P_F)\sum_{i=1}^{2}\sigma(i)\sum_{t=0}^{\infty}\sum_{j=1}^{2}\big[c(i)\,t + L(i,j)\big]\,b_j(t\mid i) \qquad (16)$$

where

$$P_F = \frac{(1-\rho)(1-b_0)}{1-b_0(1-\rho)}$$
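The closed form for $P_F$ is the sum of a geometric series: a false alarm at step $k$ requires no failure through $k$ (probability $(1-\rho)^k$) and a first stop at $k$ in the no-fail steady state (probability $b_0^{k-1}(1-b_0)$). The short Python check below compares the two; the function names and parameter values are hypothetical.

```python
def false_alarm_probability(rho, b0):
    """Closed form P_F = (1 - rho)(1 - b0) / (1 - b0 (1 - rho))."""
    return (1 - rho) * (1 - b0) / (1 - b0 * (1 - rho))

def false_alarm_series(rho, b0, kmax=20_000):
    """Direct sum over k of (1 - rho)^k * b0^(k - 1) * (1 - b0)."""
    return sum((1 - rho) ** k * b0 ** (k - 1) * (1 - b0) for k in range(1, kmax + 1))

rho, b0 = 1e-3, 0.999       # hypothetical failure rate and per-step continue probability
print(false_alarm_probability(rho, b0), false_alarm_series(rho, b0))
```

With a per-step false alarm probability $1-b_0$ comparable to $\rho$, roughly half of all runs end in a false alarm, which illustrates why low false alarm rates are needed when failures are rare.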

Next, we seek to replace the infinite sum over $t$ in (16) by the finite sum up to $t = \Delta$ plus a term approximating the remainder of the infinite sum. Suppose we have been sampling for $\Delta$ steps since the failure occurred. Define

$$P_t(j\mid i) = \Pr\{\ell(t)\in S_j\mid S_0(t-1),i,0\}, \qquad j = 0,1,2.$$

If we stop computing the probabilities after $\Delta$, we may approximate

$$P_t(j\mid i) \approx P_\Delta(j\mid i), \qquad j = 0,1,2,\ t > \Delta \qquad (17)$$

When the signature of the failure model is a constant (including the no-fail case), the reasoning behind (14) holds, and we can see that $P_t(j\mid i)$ will reach a steady-state value as $t$ (the elapsed time) increases. Then (17) is a valid approximation for large $\Delta$.

For the case where the failure signatures are not constants, the probability of continuing after $\Delta$ time steps (for sufficiently large $\Delta$) may be arbitrarily small. The error introduced by (17) in the risk (and performance probability) calculation is, consequently, small.

¹ Unfortunately, we have not been able to prove such convergence behavior using elementary techniques. More advanced function-theoretic methods may be necessary.
Substituting (17) in (16), we get

$$U_S''(f) = P_F L_F + (1-P_F)\sum_{i=1}^{2}\sigma(i)\Big[c(i)\,\bar t_i + \sum_{j=1}^{2}L(i,j)\,P(i,j)\Big] \qquad (18)$$

where

$$\bar t_i = \sum_{j=1}^{2}\sum_{t=0}^{\Delta} t\,b_j(t\mid i) + b_0(\Delta\mid i)\Big[\Delta + \frac{1}{1-P_\Delta(0\mid i)}\Big] \qquad (19)$$

$$P(i,j) = \sum_{t=0}^{\Delta} b_j(t\mid i) + b_0(\Delta\mid i)\,\frac{P_\Delta(j\mid i)}{1-P_\Delta(0\mid i)} \qquad (20)$$

$P_F$ is the unconditional false alarm probability, i.e. the probability of a false alarm over all time; $\bar t_i$ is the conditional expected delay to decision, given that a type $i$ failure has occurred; and $P(i,j)$ is the conditional probability of declaring a type $j$ failure, given that failure $i$ has occurred. From the assumption that $\Pr\{S_0(T)\mid 0,-\} \approx 1$ and the steady-state condition (14), it can be shown that the mean time between false alarms is simply $(1-b_0)^{-1}$. Now all the probabilities in (18)-(20) can be computed by using the quadrature algorithm. Note that the risk expression (18) consists only of finite sums, and it can be evaluated with a reasonable amount of computational effort. With such an approximation of the sequential risk, we will be able to consider the problem of determining the decision regions (the thresholds $f$) that minimize the risk.
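Equations (18)-(20) reduce the risk to finite sums over $b_j(t\mid i)$, $t = 0,\ldots,\Delta$, plus geometric tail corrections. The NumPy sketch below evaluates them for one failure type at a time; the array layout (column 0 holding $b_0(t\mid i)$, the probability of still sampling after elapsed time $t$) and the function names are assumptions. A convenient check is a stopping process whose per-step probabilities are constant from the start, for which the truncated formulas are exact: with $(P(0), P(1), P(2)) = (0.7, 0.2, 0.1)$ the mean delay is $0.7/0.3$ and the detection probabilities are $2/3$ and $1/3$.

```python
import numpy as np

def truncated_performance(b, P_delta):
    """Eqs. (19)-(20) for one failure type i.

    b       : array (Delta + 1, 3); b[t, 0] = b_0(t|i), b[t, j] = b_j(t|i) for j = 1, 2
    P_delta : (P_Delta(0|i), P_Delta(1|i), P_Delta(2|i))
    Returns (t_bar_i, array [P(i,1), P(i,2)]).
    """
    Delta = b.shape[0] - 1
    t = np.arange(Delta + 1)
    leave = 1.0 - P_delta[0]                         # per-step stopping probability in the tail
    t_bar = float(t @ b[:, 1:].sum(axis=1)) + b[Delta, 0] * (Delta + 1.0 / leave)   # (19)
    P = b[:, 1:].sum(axis=0) + b[Delta, 0] * np.asarray(P_delta[1:]) / leave       # (20)
    return t_bar, P

def approximate_risk(PF, LF, sigma, c, L, t_bars, Ps):
    """Eq. (18); L[i] holds [L(i,1), L(i,2)] and Ps[i] holds [P(i,1), P(i,2)]."""
    inner = sum(sigma[i] * (c[i] * t_bars[i] + np.dot(L[i], Ps[i])) for i in range(len(sigma)))
    return PF * LF + (1.0 - PF) * inner

p0, p1, p2 = 0.7, 0.2, 0.1
t = np.arange(6)                                     # Delta = 5
b = np.column_stack([p0 ** (t + 1), p0 ** t * p1, p0 ** t * p2])
t_bar, P = truncated_performance(b, (p0, p1, p2))
print(round(t_bar, 6), np.round(P, 6))               # 2.333333 [0.666667 0.333333]
```

The tail terms are what keep the truncation at $\Delta$ from biasing the delay and detection probabilities when $P_t(j\mid i)$ has settled.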

It should be noted that we could instead consider choosing a set of thresholds that minimizes a weighted combination of certain detection probabilities ($P(i,j)$), the expected detection delay ($\bar t_i$), and the mean time between false alarms ($(1-b_0)^{-1}$). Although such an objective function will not result in a Bayesian design in general, it is a valid design criterion that may be useful for some applications.


Risk Minimization

The risk minimization problem has two features that deserve special attention. Firstly, the sequential risk is not a simple function of the thresholds $f$, and its derivative with respect to $f$ is not readily available. Secondly, calculating the risk is a costly task. Therefore, the minimum-seeking procedure to be used must require few function (risk) evaluations, and it must not require derivatives. The sequence-of-quadratic-programs (SQP) algorithm studied by Winfield [10] has been chosen to solve this problem, because it does not need any derivative information and it appears to require fewer function evaluations than other well-known algorithms [10]. Furthermore, the SQP algorithm is simple, and it has quadratic convergence. Very briefly, the algorithm consists of the following. At each iteration a quadratic surface is fitted to the risk function locally; then the quadratic model is minimized over a constraint region (hence the name SQP). The risk function is evaluated at this minimum, and the new value is used in the surface fitting of the next iteration. The details of the application of SQP to risk minimization are reported in [1].
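The iteration just described can be sketched as follows. This is not Winfield's algorithm [10]: it is a simplified illustration in Python with NumPy that fits a full quadratic in two thresholds by least squares to all risk values evaluated so far, minimizes the fitted model over a square box around the best point (a crude stand-in for the constraint region), and halves the box after an unsuccessful step. The function names, the tuning constants, and the test function `bowl` are hypothetical; `risk` stands for any costly evaluation of $U_S''(f)$.

```python
import numpy as np
from itertools import product

def quad_features(X):
    """Design matrix [1, x, y, x^2, x*y, y^2] for points X of shape (n, 2)."""
    x, y = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x, y, x * x, x * y, y * y])

def quadratic_model_search(risk, f0, radius=1.0, iters=15, grid=41):
    """Derivative-free minimization of risk(f), f in R^2, via local quadratic models."""
    f0 = np.asarray(f0, dtype=float)
    pts = [f0 + radius * np.array(d) for d in product((-1.0, 0.0, 1.0), repeat=2)]
    vals = [risk(p) for p in pts]                      # 9 initial risk evaluations
    k = int(np.argmin(vals))
    best, best_val = pts[k], vals[k]
    offsets = np.array(list(product(np.linspace(-1.0, 1.0, grid), repeat=2)))
    for _ in range(iters):
        # least-squares quadratic surface through all points evaluated so far
        coef, *_ = np.linalg.lstsq(quad_features(np.array(pts)), np.array(vals), rcond=None)
        # minimize the model over a box (the "constraint region") around the incumbent
        cand = best + radius * offsets
        new = cand[int(np.argmin(quad_features(cand) @ coef))]
        val = risk(new)                                # one risk evaluation per iteration
        pts.append(new)
        vals.append(val)
        if val < best_val:
            best, best_val = new, val
        else:
            radius *= 0.5                              # shrink after an unsuccessful step
    return best, best_val

def bowl(f):                                           # hypothetical stand-in for U_S''(f)
    x, y = f
    return (x - 1) ** 2 + 2 * (y + 2) ** 2 + 0.5 * (x - 1) * (y + 2)

best, best_val = quadratic_model_search(bowl, [0.0, 0.0])
print(best, best_val)                                  # [ 1. -2.] 0.0
```

On an exactly quadratic test function the fit is exact after the nine initial evaluations, so the first model step lands on the minimizer; on a real risk surface the model is only local, and the shrinking box keeps each step within the region where the fit is trusted.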

4. NUMERICAL EXAMPLE 


Cost functions (Table 3): L(1,2) = L(2,1) = 10, L(1,1) = L(2,2) = 0.


Here, we will discuss an application of the suboptimal rule design methodology described above to a numerical example. We will consider the detection and identification of two possible failure modes (without identifying the failure times). We assume that the residual is a 2-dimensional vector, and the vector failure signatures $g_i(t)$, $i = 1, 2$, as functions of the elapsed time t, are shown in Table 1. The signature of the first failure mode is simply a constant vector. The first component of $g_2(t)$ is a constant, while the second component is a ramp. We have chosen to examine these two types of signature behavior (constant bias and ramp) because they are simple and describe a large variety of failure signatures that are commonly seen in practice. For simplicity, we have chosen V, the covariance of r, to be the identity matrix.
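To make the two signature types concrete, the sketch below generates residual sequences for the no-fail case and for each failure mode, with V equal to the identity. The numerical bias and ramp-slope values are hypothetical placeholders (the entries of Table 1 are not reproduced here), and the ramp is assumed to start from zero at the failure onset.

```python
import numpy as np

# Hypothetical signature parameters (placeholders, not the values in Table 1).
BIAS_1 = np.array([1.0, 0.5])   # mode 1: constant vector
BIAS_2 = 0.8                    # mode 2, first component: constant
SLOPE_2 = 0.2                   # mode 2, second component: ramp slope per sample


def signature(mode, t):
    """Failure signature g_mode(t) at elapsed time t >= 0 since the failure."""
    if mode == 1:
        return BIAS_1.copy()
    if mode == 2:
        return np.array([BIAS_2, SLOPE_2 * t])
    raise ValueError("mode must be 1 or 2")


def simulate_residual(mode, onset, n, rng):
    """Return an (n, 2) residual: N(0, I) noise plus g_mode(k - onset) for k >= onset.

    mode=None gives the no-fail residual.
    """
    r = rng.standard_normal((n, 2))  # covariance V = identity
    if mode is not None:
        for k in range(onset, n):
            r[k] += signature(mode, k - onset)
    return r
```

A Monte Carlo study would pass many such sequences (for example, with different random seeds) through the decision rule under test.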

We will design both a simplified sliding window rule (that uses $L^i$) and a rule using the Markov statistic z. The parameters associated with L, T, l, and z are shown in Table 2, and the cost functions and the prior probabilities are shown in Table 3. To facilitate discussion, we introduce the following terminology. We will refer to a Monte Carlo simulation of the sliding window rule as SW, a simulation of the rule using the Markov statistic z as the Markov implementation (MI), and a simulation of the nonimplementable decision process using the approximation l as the Markov approximation (MA). (All simulations are based on 10,000 trajectories.) The notation Q20 refers to the results of applying the quadrature algorithm to the approximation of
Table 1. Failure signatures.
Table 3. Cost Functions and Prior Probability.
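Since the SW, MI, and MA curves are Monte Carlo estimates from 10,000 trajectories, it helps to know how much sampling noise such estimates carry. The sketch below uses the standard binomial standard error $\sqrt{p(1-p)/N}$ for an estimated probability p; applying it to these particular curves is an illustration, not an analysis reported in the paper.

```python
import math


def mc_standard_error(p_hat, n_trajectories):
    """Binomial standard error of a probability estimated from independent trajectories."""
    if not 0.0 <= p_hat <= 1.0:
        raise ValueError("p_hat must lie in [0, 1]")
    return math.sqrt(p_hat * (1.0 - p_hat) / n_trajectories)


# Example: an estimated probability of 0.05 from 10,000 trajectories.
se = mc_standard_error(0.05, 10_000)  # about 0.0022
```

Differences between two estimated curves that are only a few standard errors apart (roughly ±0.002 near p = 0.05 at this sample size) are therefore close to the resolution of the simulations.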

The results of SW, MA, and Q20 for the thresholds [3.85, 12.05] are shown in Figs. 2-6 (see (15) for the definition of the notation). The quadrature results Q20 are very close to MA, indicating good accuracy of the quadrature algorithm. In comparing SW with MA, it is evident that the Markov approximation (MA) slightly underestimates the false alarm rate of the sliding window rule (SW). However, the response of the Markov approximation to failures is very close to that of the sliding window rule. In the present example, $L^i$ is a 7th-order process, while its approximation $l$ is only of first order. In view of this fact, we can conclude that $l$ provides a very reasonable and useful approximation of $L^i$.
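The difference in order shows up in how much state each kind of statistic must carry. The sketch below contrasts a sliding-window sum, which must buffer recent inputs, with a first-order recursion $z_k = \lambda z_{k-1} + x_k$, which needs only its previous value. The recursion, the forgetting factor `lam`, and the scalar inputs `xs` (standing in for per-sample log-likelihood increments) are assumptions for illustration; the paper's exact definitions of $L^i$, $l$, and z are not restated in this section.

```python
from collections import deque


def sliding_window_sums(xs, window):
    """Sum of the most recent `window` inputs at each step (needs a buffer of inputs)."""
    buf = deque(maxlen=window)
    out = []
    for x in xs:
        buf.append(x)  # the oldest input drops out automatically once the buffer is full
        out.append(sum(buf))
    return out


def first_order_recursion(xs, lam):
    """z_k = lam * z_{k-1} + x_k with z_0 = 0 (needs only the previous z)."""
    z = 0.0
    out = []
    for x in xs:
        z = lam * z + x
        out.append(z)
    return out
```

With a window of W samples the first function keeps up to W numbers of state, whereas the recursion keeps one; this is the practical sense in which a first-order statistic is simpler to implement.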

The successive choices of thresholds by SQP for the sliding window rule are plotted in Fig. 7. Note that we have not carried the SQP algorithm far enough for the successive choices of thresholds to be, say, within .001 of each other. This is because in later iterations the performance indices become relatively insensitive to small changes in the f's. This, together with the fact that we are only computing an approximate Bayes risk, means that fine-scale optimization is not worthwhile. Therefore, with the approximate risk, SQP is most efficiently used to locate the zone where the minimum lies. That is, the SQP algorithm is terminated when it is evident that it has converged into a reasonably small region. Then we may choose the thresholds that give the smallest risk as the approximate solution of the minimization.

In the event that the thresholds that yield the smallest risk do not provide the desired detection performance, the design parameters L, c, u, and W may be adjusted and SQP may be repeated to obtain a new design. A practical alternative is to make use of the list of performance indices (e.g., P(i,j)) that are generated in the risk calculation and choose a pair of thresholds that yields the desired performance.
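The practical alternative just described amounts to filtering the designs already evaluated during the risk calculation. In the sketch below, each hypothetical `Candidate` record holds its thresholds, risk, correct detection probabilities P(i,i), and a false alarm rate; the record layout, the form of the performance targets, and the choice of the smallest risk among feasible designs are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Candidate:
    thresholds: Tuple[float, float]
    risk: float
    p_correct: Tuple[float, ...]   # P(i, i) for each failure mode i
    false_alarm_rate: float


def pick_design(candidates: Sequence[Candidate],
                min_p_correct: Sequence[float],
                max_false_alarm_rate: float) -> Optional[Candidate]:
    """Among designs meeting the performance targets, return the one with the smallest risk."""
    feasible = [
        c for c in candidates
        if c.false_alarm_rate <= max_false_alarm_rate
        and all(p >= m for p, m in zip(c.p_correct, min_p_correct))
    ]
    return min(feasible, key=lambda c: c.risk) if feasible else None
```

If no recorded design meets the targets, the function returns `None`, which corresponds to the case where the design parameters would have to be adjusted and SQP repeated.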

The performance of the decision rules using $L^i$ and z, as determined by SQP, is shown in Figs. 8-12. (The thresholds for $L^i$ are [8.35, 12.05], and those for z are [6.29, 11.69].) We note that MI has a higher false alarm rate than SW. The speed of detection of the two rules is similar. While MI has a slightly higher type-1 correct detection probability than SW, SW has a consistently higher type-2 correct detection probability than MI. By raising the thresholds of the rule using z appropriately, we can decrease the false alarm rate of MI to that of SW, at the cost of an increase in detection delay and with a slightly improved correct detection probability for the type-2 failure (with ramp signature). Thus, the sliding window rule is slightly superior to the rule using z in the sense that, when both are designed to yield a comparable false alarm rate, the latter will have longer detection delays and a slightly lower correct detection probability (for the type-2 failure). In view of the fact that a decision rule using z is much simpler to implement, it is worth considering as an alternative to the sliding window rule.
In summary, the result of applying our decision rule design method to the present example is very good. The quadrature algorithm has been shown to be useful, and the Markov approximation of $L^i$ by $l$ is a valid one. The SQP algorithm has demonstrated its simplicity and usefulness through the numerical example. Finally, the Markov decision statistic z has been shown to be a worthy alternative to the sliding window statistic $L^i$.

5. CONCLUSION

A methodology based on the Bayesian approach has been developed for designing suboptimal sequential decision rules. This methodology has been applied to a numerical example, and the results indicate that it is a useful design approach.
Fig. 1 Sequential Decision Regions in 2 Dimensions
Fig. 2 fc 0 (t/0) - SW, MA, and Q20